evaluation measure
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- North America > United States > Washington > King County > Bellevue (0.04)
- North America > United States > Ohio (0.04)
- (2 more...)
- Information Technology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)
- Health & Medicine > Therapeutic Area > Oncology (0.67)
The Elephant in the Room: Towards A Reliable Time-Series Anomaly Detection Benchmark
Time-series anomaly detection is a fundamental task across scientific fields and industries. However, the field has long faced the ``elephant in the room:'' critical issues including flawed datasets, biased evaluation measures, and inconsistent benchmarking practices that have remained largely ignored and unaddressed. We introduce the TSB-AD to systematically tackle these issues in the following three aspects: (i) Dataset Integrity: with 1070 high-quality time series from a diverse collection of 40 datasets (doubling the size of the largest collection and four times the number of existing curated datasets), we provide the first large-scale, heterogeneous, meticulously curated dataset that combines the effort of human perception and model interpretation; (ii) Measure Reliability: by revealing issues and biases in evaluation measures, we identify the most reliable and accurate measure, namely, VUS-PR for anomaly detection in time series to address concerns from the community; and (iii) Comprehensive Benchmarking: with a broad spectrum of 40 detection algorithms, from statistical methods to the latest foundation models, we perform a comprehensive evaluation that includes a thorough hyperparameter tuning and a unified setup for a fair and reproducible comparison. Our findings challenge the conventional wisdom regarding the superiority of advanced neural network architectures, revealing that simpler architectures and statistical methods often yield better performance. The promising performance of neural networks on multivariate cases and foundation models on point anomalies highlights the need for further advancements in these methods.
TopiCLEAR: Topic extraction by CLustering Embeddings with Adaptive dimensional Reduction
Fujita, Aoi, Yamamoto, Taichi, Nakayama, Yuri, Kobayashi, Ryota
Rapid expansion of social media platforms such as X (formerly Twitter), Facebook, and Reddit has enabled large-scale analysis of public perceptions on diverse topics, including social issues, politics, natural disasters, and consumer sentiment. Topic modeling is a widely used approach for uncovering latent themes in text data, typically framed as an unsupervised classification task. However, traditional models, originally designed for longer and more formal documents, struggle with short social media posts due to limited co-occurrence statistics, fragmented semantics, inconsistent spelling, and informal language. To address these challenges, we propose a new method, TopiCLEAR: Topic extraction by CLustering Embeddings with Adaptive dimensional Reduction. Specifically, each text is embedded using Sentence-BERT (SBERT) and provisionally clustered using Gaussian Mixture Models (GMM). The clusters are then refined iteratively using a supervised projection based on linear discriminant analysis, followed by GMM-based clustering until convergence. Notably, our method operates directly on raw text, eliminating the need for preprocessing steps such as stop word removal. We evaluate our approach on four diverse datasets, 20News, AgNewsTitle, Reddit, and TweetTopic, each containing human-labeled topic information. Compared with seven baseline methods, including a recent SBERT-based method and a zero-shot generative AI method, our approach achieves the highest similarity to human-annotated topics, with significant improvements for both social media posts and online news articles. Additionally, qualitative analysis shows that our method produces more interpretable topics, highlighting its potential for applications in social media data and web content analytics.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- (14 more...)
- Media > News (0.90)
- Health & Medicine (0.68)
- Leisure & Entertainment > Sports (0.68)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
The Eigenvalues Entropy as a Classifier Evaluation Measure
Classification is a machine learning method used in many practical applications: text mining, handwritten character recognition, face recognition, pattern classification, scene labeling, computer vision, natural langage processing. A classifier prediction results and training set information are often used to get a contingency table which is used to quantify the method quality through an evaluation measure. Such measure, typically a numerical value, allows to choose a suitable method among several. Many evaluation measures available in the literature are less accurate for a dataset with imbalanced classes. In this paper, the eigenvalues entropy is used as an evaluation measure for a binary or a multi-class problem. For a binary problem, relations are given between the eigenvalues and some commonly used measures, the sensitivity, the specificity, the area under the operating receiver characteristic curve and the Gini index. A by-product result of this paper is an estimate of the confusion matrix to deal with the curse of the imbalanced classes. Various data examples are used to show the better performance of the proposed evaluation measure over the gold standard measures available in the literature.
- North America > United States > New Jersey (0.04)
- Europe > Switzerland (0.04)
- Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
Toward Interpretable Evaluation Measures for Time Series Segmentation
Chavelli, Félix, Boniol, Paul, Thomazo, Michaël
Time series segmentation is a fundamental task in analyzing temporal data across various domains, from human activity recognition to energy monitoring. While numerous state-of-the-art methods have been developed to tackle this problem, the evaluation of their performance remains critically limited. Existing measures predominantly focus on change point accuracy or rely on point-based measures such as Adjusted Rand Index (ARI), which fail to capture the quality of the detected segments, ignore the nature of errors, and offer limited interpretability. In this paper, we address these shortcomings by introducing two novel evaluation measures: WARI (Weighted Adjusted Rand Index), that accounts for the position of segmentation errors, and SMS (State Matching Score), a fine-grained measure that identifies and scores four fundamental types of segmentation errors while allowing error-specific weighting. We empirically validate WARI and SMS on synthetic and real-world benchmarks, showing that they not only provide a more accurate assessment of segmentation quality but also uncover insights, such as error provenance and type, that are inaccessible with traditional measures.
- North America > United States > California (0.14)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
- (8 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- North America > United States > Washington > King County > Bellevue (0.04)
- North America > United States > Ohio (0.04)
- (2 more...)
- Information Technology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)
- Health & Medicine > Therapeutic Area > Oncology (0.67)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > Canada (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- Research Report > Strength High (0.94)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
MLFMF: Data Sets for Machine Learning for Mathematical Formalization Supplementary Material
We obtained the source code of the libraries from their publicly available GitHub repositories. Table 1: The versions of the four libraries transformed to the data sets. As described in the article manuscript (see Section 3.2), every data set corresponds to a library of formalized mathematics. The second part of the data set, the directed multi-graph, or network for short, is stored in a text file, listing its nodes and links. In the following subsections, we describe the file format and the structure of the repository.